Spatiotemporal information enhanced multi-feature short-term traffic flow prediction

Accurately predicting traffic flow is crucial for optimizing traffic conditions, reducing congestion, and improving travel efficiency. To explore spatiotemporal characteristics of traffic flow in depth, this study proposes the MFSTBiSGAT model. The MFSTBiSGAT model leverages graph attention networks to extract dynamic spatial features from complex road networks, and utilizes bidirectional long short-term memory networks to capture temporal correlations from both past and future time perspectives. Additionally, spatial and temporal information enhancement layers are employed to comprehensively capture traffic flow patterns. The model aims to directly extract original temporal features from traffic flow data, and utilizes the Spearman function to extract hidden spatial matrices of road networks for deeper insights into spatiotemporal characteristics. Historical traffic speed and lane occupancy data are integrated into the prediction model to reduce forecasting errors and enhance robustness. Experimental results on two real-world traffic datasets demonstrate that MFSTBiSGAT successfully extracts and captures spatiotemporal correlations in traffic networks, significantly improving prediction accuracy.


Introduction
With the improvement of China's economic level, more and more families choose to travel by private car, which not only increases the chance of traffic congestion, but also brings serious challenges to traffic management [1].Traffic flow prediction can assist citizens in planning their travel routes in advance, avoiding congested sections and saving fuel costs.For traffic management departments, the prediction results can help them formulate diversion and control strategies, reduce the probability of accidents, and ultimately achieve the goals of efficient resource utilization, reduced energy consumption, environmental protection, and alleviating travel pressure [2].The development of traffic flow prediction algorithms is mainly divided into three directions: statistical analysis theory, machine learning and deep learning.
Statistical theory analysis methods used for traffic flow prediction include Kalman filters (KF) [3], historical average method (HA) [4], autoregressive integrated moving average model (ARIMA) [5], etc.However, these models are sensitive to noise and the selection of model parameters, and they are difficult to handle the nonlinear relationships in complex data, resulting in unsatisfactory performance.Therefore, some scholars have attempted to use machine learning methods for traffic flow prediction, such as artificial neural networks (ANN) [6], knearest neighbors (KNN) [7] and support vector regression (SVR) [8], among other methods.Cui et al. applied backpropagation neural network (BP) to improve the training effect and generalization ability of the network by training the network with small error and using its weight vector as the initial value for the next network, together with adaptive learning rate and additional momentum method [9].Zhang et al. [10] used a BPNN model to predict the traffic flow during peak hours on a non-topological road segment in Beijing.These machine learning models are typically capable of handling complex data and nonlinear relationships.However, they have higher requirements for data volume and feature engineering, and there is a risk of overfitting.
With the deepening research on deep learning, researchers have begun to explore the use of deep learning [11] methods for traffic flow prediction.In the field of traffic flow prediction, commonly used deep learning models include Convolutional Neural Networks (CNN) [12], Recurrent Neural Networks (RNN) [13,14], Graph Neural Networks (GNN) [15,16] and so on.For example, Yao et al. proposed a traffic prediction method that combines CNN and Long Short-Term Memory (LSTM) to jointly model spatial and temporal dependencies [17].Xu et al. [18] introduced a new model called C-BiLSTM, which combines Convolutional Neural Networks with Bidirectional Long Short-Term Memory Networks.This model extracts local spatial features and forward-backward temporal features to better capture the spatio-temporal characteristics of traffic flow.Although CNN can automatically learn high-level features from data [19], it needs to consider the network structure selection problem when extracting spatial features of roads, and the size and number of grids need to be set manually, which has some limitations for extracting spatial features of road topology.
Compared to CNN, Graph Convolutional Networks (GCN) have better adaptability and accuracy in extracting spatial features of roads.The main reason is that GCN can leverage the road network's topological structure to perform convolutions on nodes and edges, thus extracting road spatial features.Moreover, the convolutional kernels in GCN can adaptively adjust, significantly improving the model's robustness.Based on these advantages, numerous researchers have conducted studies on traffic flow prediction by incorporating GCN.For instance, Yu et al. [20] constructed a spatio-temporal convolution block similar to a "sandwich" structure by combining graph convolution and gated temporal convolution in order to fuse spatio-temporal and spatial features, which can extract useful spatial features and basic temporal features.Guo et al. [21] proposed the ASTGCN model, which employs a stacked model structure, combining multiple STGCNs and an attention mechanism to form a multilayered structure, in order to further improve the model's accuracy and robustness.Fang et al. [22] designed and solved a neural ODE to complement the missing graph topology and unified spatial and temporal messaging, allowing for deeper graph propagation and fine-grained temporal message aggregation to characterize stable and accurate underlying spatial-temporal dynamics.Kong et al. [23] constructed a network to model relationships between stations, employing a deep clustering method based on graph neural networks to extract mobility patterns of bus stations.They designed a spatiotemporal prediction model named STGNNFormer based on Transformer to forecast bus station traffic, thereby enhancing prediction accuracy and efficiency.
Most of the aforementioned models overlook the dynamic spatial characteristics of traffic flow and only focus on extracting static spatial features.To address the issue of capturing the dynamic spatial characteristics in traffic flow, some dynamic evolution models based on GCN have appeared in recent years, such as Graph Attention Network (GAT) [24].GAT calculates attention weights between nodes and their neighboring nodes, allowing each node to update its information as a weighted average of its neighbors.By doing so, GAT can more accurately capture the relationships between nodes and edges, thereby extracting richer information about dynamic spatial characteristics.Based on this, this paper proposes the MFSTBiSGAT deep learning framework, which aims to comprehensively describe the dynamic spatial characteristics of road networks by deeply mining the spatiotemporal features of traffic flow.The specific contributions are twofold: 1. Combining GAT with BiLSTM networks and utilizing the Spearman function to delve into hidden correlations among different nodes.
2. Employing three STBiSGAT components with identical structures to concurrently handle various traffic metrics, namely traffic flow, speed, and lane occupancy.This approach models spatiotemporal dependencies comprehensively, enabling a more thorough analysis of traffic flow conditions.

Problem defintion
This paper defines the traffic network as an undirected graph G = (V, E, A), each node in the graph represents a spatiotemporal graph data, as shown in Fig 1 .Each time slice represents a spatial graph.As time progresses, it can be observed that the traffic flow of a given node is influenced not only by its historical data but also by the spatial information of adjacent nodes.
In graph where N is the number of nodes in the traffic network, E is the set of edges, and A is the adjacency matrix of graph G, A 2 [0,1] N×N .Let the historical traffic flow sequence of a node be {X t,i |t = 1, 2, 3, . .., T; i = 1, 2, 3, . .., N}, where X t,i represents the traffic flow data of node i at time t.Therefore, the short-term traffic flow prediction problem in the traffic network can be represented as Eq (1): In the equation, n represents the length of the historical time series, T represents the length of the prediction time series, and G represents the representation of the distributed node topology structure relationship.

Spatial information extraction layer
The spatial information extraction layer consists of two parts: spatial feature extraction layer and spatial information enhancement layer.The spatial feature extraction layer considers the topological structure of the road network.The input is a matrix X formed by stacking the traffic flow time series of each node, and an adjacency matrix A generated from the input data in the distance file to represent the graph.By using GAT, the explicit spatial features h t,1 of the road network are extracted.The specific formula for aggregating the weighted contributions of node j to node i and obtaining the new features is as follows: where x i and x j are the feature vectors of node i and node j respectively.e ij denotes the attention score between node i and node j, a corresponds to a single-layer feed-forward neural network used to compute the attention score between the two nodes, α ij is the normalized attention weight, W is the corresponding learning parameter, and h 0 i is the new feature representation of node i.
However, in order to enrich the expressiveness of the model, multi-head attention is introduced in GAT.The results obtained from K attention mechanisms are averaged to obtain the final new feature h i of node i, as shown in Eq (5).Similarly, the explicit spatial feature h t,1 can be obtained by calculating the new features of all nodes.
The spatial information enhancement layer considers the complex nonlinear relationships among the nodes in the road network.It utilizes the Spearman correlation coefficient to uncover hidden correlations, which helps further enhance the spatial information.
As shown in Fig 3, the heat map of the relationship between nine neighboring nodes is generated using the Spearman correlation coefficient.The numerical values in the heatmap represent the degree of correlation for each corresponding node, with larger values indicating stronger correlations.Therefore, considering the implicit relationship between spatial nodes, the Spearman correlation coefficient is used here to extract the degree of correlation of the nodes and form the corresponding correlation coefficient matrix ρ, which is used to replace the adjacency matrix A, and is input to GAT with the traffic flow matrix X t to obtain the implicit spatial features of the enhanced information h t,2 .Its Spearman correlation coefficient is calculated as follows: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Where R(x i ) and R(y i ) represent the bit of the ith data in the data of x node and y node, respectively, R x ð Þ and R y ð Þ denote the average bit in the data of x node and y node, respectively, where the bits are arranged according to the magnitude of the value of the traffic speed at each moment in time, and n denotes the total number of observed nodes.
The obtained explicit spatial features are fused with the implicit spatial features to obtain the final spatial features as follows: where σ(�) is the RELU activation function and � is the Hadamard product, which represents the multiplication of the corresponding elements of the matrix.

Temporal information extraction layer
The dynamic nature of traffic flow data indicates that short-term traffic flows exhibit strong correlations, while traffic information exhibits long-term dependencies in terms of temporal features.The Bidirectional Long Short-Term Memory (BiLSTM) network can handle both forward and backward time series data, capturing longer-term dependencies and considering the dynamic features of data in previous and subsequent time periods.Therefore, BiLSTM is chosen in this paper to extract the temporal features of the traffic road network.The previously extracted spatial features ĥt;1 are used as input, and the forward hidden state ht;2 and backward hidden state h ~ t ;2 of the BiLSTM network are computed using Eqs ( 8) and ( 9), through the LSTM network.When both directions of the LSTM are trained, the outputs of the bi-directional hidden states are linked into new features ĥt;2 with spatio-temporal correlations via Eq (10) as follows:

Temporal information enhancement layer
In order to prevent each highway from overly relying on surrounding road information while maintaining its own personalized time series features, inspired by the ResNet network structure, BiLSTM is specifically used to extract the temporal features of the original input data, enriching the temporal information of traffic flow.The specific process is similar to the previous time feature extraction, but here the original traffic flow matrix X t is used as input, and the enhanced temporal features ĥt;3 are obtained through the BiLSTM network.The specific formula is as follows:

Prediction layer
The prediction layer of this model consists of a fully connected network.It concatenates the spatiotemporal features ĥt;2 extracted from the previous layers with the temporal features ĥt;3 extracted from the temporal information enhancement layer, resulting in the feature ĥt .The fully connected network assigns different weights to ĥt and finally outputs the predicted traffic flow � Y .The specific formula is as follows: where w is the weight parameter and b is the bias term.

Prediction results
Under the same road conditions, when the traffic flow increases, the mutual influence and competition among vehicles increase, leading to vehicle deceleration and a decrease in traffic speed.Thus, there is a negative correlation between traffic flow and traffic speed.At the same time, as traffic volume increases, the proportion of occupied lanes also increases.When traffic volume is low, there is a relatively large gap between vehicles, resulting in a lower lane occupancy rate.Thus, the two factors are positively correlated.From this, it can be seen that there is a close relationship between traffic volume, traffic speed and lane occupancy rate.Incorporating traffic speed and lane occupancy rate into the model helps improve the model's robustness.Specifically, the MFSTBiSGAT model utilizes the STBiSGAT module to separately predict future traffic volume Ŷ x , traffic speed Ŷ v and lane occupancy rate Ŷ o .It then learns the fusion impact of different components from historical traffic data, as shown in Eq (16).
where Ŷ is the predicted traffic flow after fusing the features, σ(�) is the relu(�) activation function, and � is the Hadamard product, which represents the multiplication of the corresponding elements of the matrix.

Experimental dataset
This article selected two real-world datasets, PeMSD4 and PeMSD8, as experimental data.Both datasets were collected from the Performance Measurement System (PeMS) in California.The data in these datasets are aggregated every 5 minutes.The PeMSD4 dataset contains traffic state data from road network nodes in the San Francisco Bay Area, California, while the PeMSD8 dataset contains traffic state data from road network nodes in San Bernardino County, California, USA.By conducting comparative experiments on these two datasets, this article aims to validate the effectiveness of the improved model.Detailed information is shown in Table 1.
When there are missing data points in the dataset, this article uses linear interpolation to recover the missing data.Additionally, the data is standardized using the Z-Score method.Finally, the dataset is divided into two parts: a training set and a test set, with a ratio of 8:2.

Evaluation metrics
This article uses Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as performance metrics in the experiments.Their mathematical expressions are as follows: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 k where ŷi denotes the predicted value, y i denotes the true value and k denotes the number of samples.

Experimental settings
All experiments were conducted using a Win10 system with hardware configuration including an NVIDIA GeForce RTX 2080Ti GPU and 13th Gen Intel(R) Core™ i5-13400.The task of the MFSTBiSGAT model was implemented based on the PyTorch deep learning framework.In the model presented in this article, a sliding window approach is used with a window size of 12 historical time steps, equivalent to 1 hour, to predict the future 1 hour of traffic volume.After many experiments, to achieve better results, this article uses the Adam optimizer.
The initial learning rate is set to 1e-3.The batch size for training samples is set to 64.The number of iterations is set to 60.The BiLSTM has 4 layers, with 64 neurons in the input layer and 64 neurons in the hidden layer.To prevent overfitting and improve model generalization, a dropout rate of 0.3 is set.

Hyperparameter experiments
Increasing the number of attention heads K in multi-head attention can improve the performance of the model.However, having too many attention heads can lead to overfitting or excessive computation.Therefore, it is necessary to experiment and optimize the number of attention heads based on the specific task and dataset to find the optimal balance.In order to find the optimal number of attention heads K, this article conducted six sets of comparative experiments with K as the only variable, and the results are shown in Fig 4 .It is evident from the figure that the MFSTBiSGAT model achieves the lowest MAE and RMSE values when K = 5 on both datasets.This suggests that the model can effectively extract features from multiple subspaces, thereby enhancing its expressive power.Therefore, in order to improve the accuracy and expressive power of the model, the attention head number K of GAT is set to 5, and the hidden layer dimension is set to 32.

Experimental results and analysis
In the experiments, this article conducted performance validation on the PeMSD4 and PeMSD8 datasets.According to the results shown in Table 2, the MFSTBiSGAT model performs the best on both datasets.In comparison, models such as HA, BPNN and LSTM that only consider temporal features perform poorly because they overlook the spatial characteristics of the road network.Similarly, the GCN model that only considers spatial features also has certain limitations.Therefore, combining temporal and spatial features is essential for accurate traffic flow prediction.In this regard, the GCN-BiLSTM model successfully integrates the GCN network into the BiLSTM architecture, effectively capturing the complex spatial dependencies between road segments and the temporal patterns of traffic flow.As a result, it improves the prediction accuracy.
However, the MFSTBiSGAT model performs better in terms of prediction accuracy, and the specific experimental results are shown in   Compared to the DSTAGNN model, only one metric is larger, while the other three metrics are lower.It can be seen that the MFSTBiSGAT method in this paper fully considers the spatio-temporal dynamic features, effectively captures the evolutionary trends of nodes, and thus describes the behavior of nodes in the spatio-temporal domain.

Ablation analysis
To further investigate the superiority of our model in this study, we conducted ablation experiments for comparison.Three variants based on the MFSTBiSGAT model are designed and compared to predict the change in traffic flow error for different future time steps (5min-60min), the details of which are given below.As shown in Fig 6, based on the results of the evaluation metrics in the figure, the MFSTBiS-GAT model exhibits the best predictive performance.It can be observed that the spatial information enhancement layer, temporal information enhancement layer, and other features are all important for the predictions of this model.Due to the complex nonlinear relationships between nodes in spatial domain, the MFSTBiSGAT-S model, which removes the module for capturing these hidden correlations, exhibits relatively lower performance.Traffic flow exhibits significant temporal dynamics, and the MFSTBiSGAT-T model, due to the loss of some temporal features during the early extraction of spatial features, is unable to accurately capture the original temporal characteristics of the raw traffic flow.As a result, the model exhibits the poorest predictive performance.The MFSTBiSGAT-MF model, due to the removal of other features, exhibits reduced robustness, leading to a slight decrease in overall predictive performance.From the trends of the evaluation metrics in the figure, it is evident that as the prediction horizon increases, the errors of all models gradually increase.

Conclusion
In response to the issue of existing models not effectively exploring the spatiotemporal features of traffic flow, we propose the MFSTBiSGAT model for traffic flow prediction based on graph deep learning framework.This model combines the advantages of GAT and BiLSTM.It can extract the spatial dynamic features of the traffic network and the temporal evolution trends of traffic flow from both the past and future directions.In addition, to explore spatio- temporal features in depth, the model captures hidden correlations between nodes in spatial feature extraction, extracts the original temporal features of the traffic network in the temporal dimension, and incorporates closely related features as traffic speed and lane occupancy, which are closely related to traffic flow, through Hadamard product and concatenation operations.This effectively enhances the expressive power of spatiotemporal information in the model.
After conducting numerous experiments on two real-world datasets, the MFSTBiSGAT model has demonstrated significantly better predictive accuracy compared to other baseline models.Compared to recent spatiotemporal prediction models, MFSTBiSGAT achieves a reduction in MAE values ranging from 7.69% to 21.69% and a reduction in RMSE values ranging from 3.25% to 16.19% on the PeMSD4 dataset.Results from ablation experiments show that the MFSTBiSGAT model outperforms models without the spatiotemporal information enhancement layer and multiple features, further confirming the effectiveness of the added modules.
However, there are some of the limits of our study.Future traffic flow prediction models will also consider additional factors such as weather conditions, holidays, and other relevant factors.These factors have a significant impact on traffic conditions and often exhibit sudden or periodic patterns.Therefore, incorporating these factors into the model is meaningful for improving the accuracy of predictions.

Fig 3 .
Fig 3. Heat map of node relationships.https://doi.org/10.1371/journal.pone.0306892.g003 where wxh , whh , w ~ x h and w ~ h h are the weight parameters of the model, bh and b ~ h are the bias terms of the model, and ϕ is the tanh activation function.

Fig 5 .
Compared to the GCN-BiLSTM model, on the PeMSD4 dataset, MFSTBiSGAT resulted in a 20.86% and 16.19% reduction in MAE and RMSE, respectively.In addition, MFSTBiSGAT outperforms another model, MSTGCN,

Fig 4 .
Fig 4. Effect of attentional head count on evaluation indicators.(a)The effect of the number of attention heads on MAE.(b)The effect of the number of attention heads on RMSE.https://doi.org/10.1371/journal.pone.0306892.g004 a. MFSTBiSGAT-S: The spatial information enhancement layer has been removed.b.MFSTBiSGAT-T: The temporal information enhancement layer has been removed.c.MFSTBiSGAT-MF: The features of traffic speed and lane occupancy have been removed.

Fig 6 .
Fig 6.Prediction errors for each model at different time steps on the two datasets.The MAE and RMSE errors on the PeMSD4 dataset are denoted as (a) and (b), respectively.The MAE and RMSE errors on the PeMSD8 dataset are denoted as (c) and (d), respectively.https://doi.org/10.1371/journal.pone.0306892.g006

Table 2 . Comparison of performance metrics for different models.
considers both temporal and spatial features, resulting in 21.69% and 16.17% reduction in MAE and RMSE, respectively.Compared to GAT-GRU, the proposed STBiSGAT model reduces the MAE and RMSE by 16.51% and 14.39% respectively.Compared to ASTGCN, the proposed STBiSGAT model reduces the MAE and RMSE by 10.96% and 7.75% respectively.Compared to STGODE, MFSTBiSGAT reduces the MAE and RMSE by 7.69% and 3.25% respectively. which